Method and system for detecting and generating transient conditions in auditory signals

ABSTRACT

The shape of energy changes of an auditory signal is used for identifying or representing features which can be perceived by a human ear as representing a distinct sound picture. In order to extract information from the shape of the energy changes, the shape is preferably represented by the shape of a transient pulse of the signal with a rise time of at most 2 ms. Envelope detection is preferably used to obtain the transient signal pulse. The energy change representing the distinct sound picture can be a phoneme or vowel. The invention also relates to a method for identifying the energy changes in the auditory signal by comparing the shape of energy changes of the signal, which can be represented as the shape of the transient pulse, with predetermined energy change shapes representing distinct sound pictures. The invention also relates to a method of speech synthesis wherein a series of transient pulses is generated corresponding to the series of phonemes to be synthesized. The invention further relates to a system for processing an auditory signal in order to reduce the bandwidth of the signal with substantial retention of the information of the signal, the system comprising means for extracting the transient component of the auditory signal, and means for detecting an envelope of the transient component.

The present invention relates to a method and system for signal processing, by which method and system features representing distinct sound pictures in auditory signals are extracted from transients in the auditory signals. The result of the processing may be used for identification of sound or speech signals or for quality measurement of audio products or systems, such as loudspeakers, hearing aids, telecommunication systems, or for quality measurement of acoustic conditions. The method of the present invention may also be used in connection with speech compression and decompression in narrow band telecommunication.

In prior art methods of analysing auditory signals, the signals are considered to be steady state over a short period of time, and a form of short-time spectral analysis is used under this assumption.

The human ear has the ability to simultaneously catch fast sound signals, detect sound frequencies with great accuracy and differentiate between sound signals in complicated sound environments. For instance, it is possible to understand what a singer is singing when accompanied by musical instruments.

In prior art methods of signal analysis and in the method of the present invention it is assumed that the cochlea in the human ear can be regarded as an infinite number of bandpass filters, IBP, within the frequency range of the human ear.

The time response f(t) of one bandpass filter to an excitation can be separated into two components, the transient response, ft(t), and the steady state response, fs(t),

    f(t)=ft(t)+fs(t).                                          (1)

Traditional signal processing is based on the steady state response fs(t), and the transient response ft(t) is assumed to vanish very fast and to be without importance for perception, see for example "Principles of Circuit Synthesis", McGraw-Hill 1959, Ernest S. Kuh and Donald O. Pederson, page 12, lines 9-15, where it is stated that:

"only the forced response is considered while the response due to the initial state of the network is ignored".

Thus, when students are introduced to the world of signal analysis, they learn at a very early stage that the transient response, i.e. the response due to the initial state of the network, should be ignored because it vanishes within a very short period of time. Furthermore, it is rather difficult to analyse these transient signals by use of traditional linear methods of analysis.

The ability of the human ear to hear very short sounds and at the same time detect frequencies with great accuracy is in conflict with traditional filter-based spectrum analysis. The time window (twice the rise time) of a bandpass filter is inversely proportional to the bandwidth,

    tw=2/(fu-fl)                                               (2)

where fl is the lower cutoff frequency and fu is the upper cutoff frequency.

Thus, if a time window of 5 ms (a rise time of 2.5 ms) is required, the consequence is that the frequency resolution is no better than 400 Hz.
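As a worked check of equation (2), the short Python sketch below computes the time window tw for a given pair of cutoff frequencies and, inversely, the bandwidth fu - fl that a given time window allows. The function names are illustrative only and do not come from the text.

```python
def time_window(f_lower_hz: float, f_upper_hz: float) -> float:
    """Time window tw = 2 / (fu - fl) of equation (2), in seconds."""
    bandwidth = f_upper_hz - f_lower_hz
    if bandwidth <= 0:
        raise ValueError("upper cutoff must exceed lower cutoff")
    return 2.0 / bandwidth


def bandwidth_for_window(tw_s: float) -> float:
    """Bandwidth fu - fl (Hz) that yields the time window tw_s, inverting (2)."""
    if tw_s <= 0:
        raise ValueError("time window must be positive")
    return 2.0 / tw_s


# A 5 ms time window corresponds to a 400 Hz bandwidth, and vice versa.
print(bandwidth_for_window(0.005))   # 400.0
print(time_window(1000.0, 1400.0))   # 0.005
```

Because tw is twice the rise time, halving the required rise time doubles the minimum bandwidth, which is the trade-off the paragraph above describes.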

As the detection of these transients is in conflict with a high frequency resolution, the human ear must detect them in some other way. How the human ear is able to detect these signals has not been examined, but it is possible that the cochlea, when no sounds are received, is in a position of rest in which it is very broad-banded. When a sound signal is received, the cochlea may start to lock itself to the frequency component or components within the signal. Thus, the cochlea may be broad-banded in its starting position, but if one or more stable frequencies are received, the cochlea may lock itself to this frequency or these frequencies with high accuracy.

Today it is known that the nerve pulses launched from the cochlea are synchronized to the frequency of a tone if the frequency is less than about 1.4 kHz. If the frequency is higher than 1.4 kHz, the pulses are launched randomly and less than once per cycle of the frequency.

Signal analysis based on filter bank spectrum analysis is disclosed in GB 2213623, which describes a system for phoneme recognition. This system comprises detecting means for detecting transient parts of a voice signal, where the principal object of the transient detection is the detection of the point where the speech spectrum varies most sharply, namely a peak point. The detection of the peak points is used for a more precise phoneme segmentation. The transient analysis of GB 2213623 is based on a spectrum analysis and the change in the spectrum, which is very different from the transient analysis of the present invention, which is based on a direct transient detection in the time domain.

The present invention is based on an approach which is different in principle from all known methods for analysing auditory signals. According to the invention it has been found that the signal information relevant to the identification of the auditory signal is present in the transient component of the signal. Thus, the method of the present invention involves a separation of the transient component or response of the auditory signal, a generation of a transient pulse corresponding to the transient component, and an analysis of the shape of the pulse. In an auditory signal, the corresponding transient pulse may be repeated with time intervals, and the time interval of these periodic transient pulses is normally also analysed or determined.

In real life the human ear reacts to energy changes at high frequencies in order to recognize phonemes or sound pictures. In the present method, transient pulses corresponding to the energy changes observed by the ear are extracted at these high frequencies, whereafter the transient pulses are preferably transformed to the low frequency range while still maintaining the distinct features of the sound pictures or phonemes. Thus, by using the principles of the invention, it is possible to obtain distinct features within auditory signals by examining the transformed low frequency signals.

As will be understood from the following explanation of the method of the invention, the concept of extracting transient waveforms or pulse shapes makes it possible to use preprocessing methods which are much simpler than the best designs presently used and at the same time obtain much more valuable information with respect to the auditory input signals.

In its broadest aspect, the invention relates to the use of the shape of energy changes of an auditory signal for identifying or representing features which can be perceived by an animal ear such as a human ear as representing a distinct sound picture.

Before entering into a more detailed explanation of features of the method of the invention, a few definitions will be given:

In short time analysis the transient component in a signal is a matter of definition. The idea is to obtain an expression that gives a response corresponding to the response in the cochlea to an abrupt change in the signal energy. An abrupt change in the signal energy corresponds to the transient component in the auditory signal. Thus, in the present context, the term "transient component" designates any signal corresponding to an abrupt energy change in an auditory signal. The transient component holds the signal information to be analysed, and in order to analyse this information the transient component may be transformed to a corresponding transient pulse having a distinct shape. Thus, in the present context, the term "transient pulse" refers to a pulse having a distinct shape and substantially holding the information of the transient component of the auditory signal and thus corresponding to an abrupt change in the energy of the auditory signal. As mentioned above, the transient part of a sound signal may be repeated with time intervals, and thus, in the present context, the term "periodic" when used in combination with a transient component, response or pulse designates any transient component, response or pulse being repeated with intervals.

The term "shape" designates any arbitrary time-varying function (which is time-limited or not time-limited) and which, within a given time interval Tp, has a distinctly different amplitude level in comparison with the amplitude level outside the interval. Thus, Tp is the duration of the shape function when the shape function is time-limited, or the duration of the part of the function which has a distinctly different amplitude level in comparison with the amplitude level outside the time interval. As will be understood, the identification of the shape of a pulse is suitably performed by observing the amplitude of the pulse along the time axis of the pulse.

In order to extract information from the shape of the energy changes, one broad aspect of the invention relates to representing the shape of the energy changes by the shape of a transient pulse of the signal. Several methods can be applied in order to obtain a transient pulse corresponding to the change in energy, but it is preferred to use envelope detection, where the envelope is preferably detected from a transient response of the energy change in the auditory signal.

The energy change representing the distinct sound picture can be a phoneme or vowel or any other sound which gives a sudden energy change in an auditory signal.

It is also an aspect of the invention to provide a method for identifying, in an auditory signal, energy changes which can be perceived by an animal ear such as a human ear as representing a distinct sound picture, the method comprising comparing the shape of energy changes of the signal with predetermined energy change shapes representing distinct sound pictures. For the identification it is preferred that the shape of the energy changes is represented by the shape of a transient pulse of the signal, and it is furthermore preferred that the shape of the transient pulse is obtained by envelope detection of a transient response of the energy change in the auditory signal.

The invention also relates to a method for processing an auditory signal so as to reduce the bandwidth of the signal with substantial retention of the information of the signal, comprising extracting the transient component of the auditory signal and detecting an envelope of the transient component. It is preferred that transient pulse shapes of the signal which can be perceived by an animal ear such as a human ear as representing a distinct sound picture are identified.

It should be noted that the pulse rise time or the form of the leading edge, the duration of the pulse, and the fall time or the form of the lagging edge are all important features for identification of the pulse. In a preferred embodiment of the invention the shape of the leading edge of a pulse is identified, and it is also preferred that the shape of the leading edge is determined by determining rise time, slope and/or slope variation of at least part of the leading edge.

In a preferred embodiment of the invention, the rise time, slope and/or slope variation of at least the top part of the leading edge is determined, since the upper part of the pulse should contain the necessary information. The top part may be defined as the part beginning substantially at a point where the slope is maximum. The top part may also be the part corresponding to the upper 50% of the amplitude of the pulse.
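The leading-edge measurements described above can be sketched in Python as follows. This is a minimal illustration, assuming the "upper 50% of the amplitude" definition of the top part (the maximum-slope definition is not implemented), a pulse sampled at rate fs, and slope variation taken as the standard deviation of the sample-to-sample slopes; the helper name `top_edge_features` is hypothetical.

```python
import numpy as np


def top_edge_features(pulse, fs, top_fraction=0.5, min_samples=5):
    """Rise time, mean slope and slope variation of the top part of a leading edge.

    The top part is taken as the samples up to the peak whose amplitude is at
    least `top_fraction` of the peak (the "upper 50 %" definition).
    """
    x = np.asarray(pulse, dtype=float)
    peak = int(np.argmax(x))
    threshold = top_fraction * x[peak]
    start = peak
    while start > 0 and x[start - 1] >= threshold:  # walk back down the leading edge
        start -= 1
    top = x[start:peak + 1]
    if top.size < min_samples:
        raise ValueError(f"top part has {top.size} samples, need at least {min_samples}")
    slopes = np.diff(top) * fs                       # amplitude units per second
    return {
        "rise_time_s": (peak - start) / fs,
        "mean_slope": float(slopes.mean()),
        "slope_variation": float(slopes.std()),      # spread of the local slopes
    }


fs = 1000.0
# Hypothetical triangular test pulse: 20 ms linear rise, 20 ms linear fall.
pulse = np.concatenate([np.linspace(0.0, 1.0, 21), np.linspace(1.0, 0.0, 21)[1:]])
print(top_edge_features(pulse, fs))
```

For the linear ramp the slope is constant, so the slope variation is essentially zero; a curved leading edge would give a non-zero value.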

When determining the shape of the pulse several methods may be used, but in a preferred embodiment the rise time, slope and/or slope variation of the leading edge is determined on the basis of at least 5 samples. However, any other suitable number of samples may be used. Another preferred method of identifying the shape of the leading edge is comparison with a library of references. Here, the references with which comparison is made could be selected on the basis of the rise time of the leading edge.
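Comparison with a library of references, preselected by rise time, might be sketched as below. The library layout (label mapped to a reference shape and its rise time), the tolerance parameter and the Euclidean distance between peak-normalised shapes are all assumptions chosen for illustration; the text does not prescribe a particular distance measure.

```python
import numpy as np


def normalise(shape):
    """Scale a pulse shape to unit peak so the comparison ignores overall level."""
    s = np.asarray(shape, dtype=float)
    return s / np.max(np.abs(s))


def match_pulse(pulse, rise_time_s, library, tolerance_s):
    """Return the label of the closest reference shape, or None (hypothetical helper).

    `library` maps label -> (reference_shape, reference_rise_time_s). Only
    references whose rise time lies within `tolerance_s` of the measured rise
    time are compared; shapes must have equal length.
    """
    candidates = {
        label: shape for label, (shape, ref_rise) in library.items()
        if abs(ref_rise - rise_time_s) <= tolerance_s
    }
    if not candidates:
        return None
    p = normalise(pulse)
    return min(candidates,
               key=lambda label: np.linalg.norm(p - normalise(candidates[label])))


library = {
    "slow ramp": (np.linspace(0.0, 1.0, 10), 0.010),
    "fast step": (np.concatenate([np.zeros(5), np.ones(5)]), 0.002),
    "early step": (np.concatenate([np.zeros(2), np.ones(8)]), 0.0025),
}
pulse = 2.0 * np.concatenate([np.zeros(5), np.ones(5)])
print(match_pulse(pulse, 0.002, library, tolerance_s=0.001))   # fast step
```

Preselecting by rise time keeps the expensive shape comparison to a small subset of the library, which is the motivation suggested in the paragraph above.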

It is also preferred to identify the duration of the pulse, where the duration of a pulse can be determined as the distance from the leading edge to the lagging edge at a predetermined amplitude.
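The duration measurement can be sketched as follows: the leading-edge and lagging-edge crossings of the predetermined amplitude are located by linear interpolation between samples, and their distance is converted to seconds. The interpolation and the helper name `pulse_duration` are assumptions; the choice of level is left to the caller.

```python
import numpy as np


def pulse_duration(pulse, fs, level):
    """Distance (s) from the leading edge to the lagging edge at amplitude `level`."""
    x = np.asarray(pulse, dtype=float)
    above = np.flatnonzero(x >= level)
    if above.size == 0:
        return 0.0
    first, last = above[0], above[-1]
    # Upward crossing between samples first-1 and first (linear interpolation).
    t_rise = float(first)
    if first > 0:
        t_rise = first - 1 + (level - x[first - 1]) / (x[first] - x[first - 1])
    # Downward crossing between samples last and last+1.
    t_fall = float(last)
    if last < x.size - 1:
        t_fall = last + (x[last] - level) / (x[last] - x[last + 1])
    return (t_fall - t_rise) / fs


triangle = [0.0, 0.5, 1.0, 0.5, 0.0]
print(pulse_duration(triangle, 1000.0, 0.5))    # duration at half amplitude
print(pulse_duration(triangle, 1000.0, 0.75))   # duration nearer the top
```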

As should be understood, it is also preferred to identify the shape of the lagging edge of the transient pulse.

The method of the present invention provides an expression for the transient conditions of the auditory signal. The method comprises a bandpass filtration of an auditory signal within the frequency range of the human ear and a detection of a lowpass filtered envelope, which envelope can then be analysed with known methods of signal analysis. The envelope is an expression of the transient part of the signal.

The known method of signal analysis to be used when analysing the envelope, and the characteristics of the bandpass filter to be selected, will depend on the purpose of the analysis. The purpose may be speech recognition, quality measurement of audio products or acoustic conditions, or narrow band telecommunication.

The invention also relates to a system for processing an auditory signal to reduce the bandwidth of the signal with substantial retention of the information of the signal, comprising means for extracting the transient component of the auditory signal, and means for detecting an envelope of the transient component.

Embodiments and details of the system appear from the claims and the detailed discussion of embodiments of the system given in connection with the figures and a mathematical description of an embodiment of the system.

The invention will now be described in further detail in connection with a mathematical description of the principle of the invention and in connection with the drawing.

FIG. 1 shows the spectra of a bandpass filter F(ω) and a lowpass filter H(ω),

FIG. 2 shows the zeros and the poles in the s-plane for an infinite number of bandpass filters, IBP, having identical bandwidth,

FIG. 3 shows the zeros and poles in the s-plane for an infinite number of bandpass filters, IBP, having identical Q,

FIG. 4 illustrates the impulse response for various root locations in the s-plane,

FIG. 5 shows a spectrogram for the words "linear prediction",

FIG. 6 illustrates how a summation of an infinite number of bandpass filters, IBP, can be performed by one bandpass filtration,

FIG. 7 illustrates the principle of a transient detection system according to the invention,

FIG. 8 shows a block diagram for a transient detection system according to the invention,

FIG. 9 shows the characteristics of a preferred highpass filter to be used in the system of FIG. 8,

FIG. 10 shows the characteristics of a preferred lowpass filter to be used in the system of FIG. 8,

FIG. 11 illustrates the sensitivity of the human ear,

FIG. 12 illustrates average formant frequencies for the American vowels /i(:)/, /ae(:)/, /a(:)/, and /u(:)/,

FIGS. 13a to 13p show the experimental results of the first transient analysis of the vowels of FIG. 12,

FIGS. 14a to 14c show processed curves of the vowel "e" as in "heat",

FIGS. 15a to 15c show similar curves as in FIGS. 14a to 14c for the vowel "o" as in "hop",

FIGS. 16a to 16b show normalized time windows for the processed curves of the vowel "e" as in "heat",

FIGS. 17a to 17b show normalized time windows for the vowel "o" as in "hop",

FIG. 18 shows normalized time windows for the vowel "a" as in "have",

FIG. 19 shows a block diagram for a speech recognition system according to the invention, and

FIGS. 20-25 show transient pulses for speech synthesis of the phonemes "i" as in "heat", "o" as in "hop", "o" as in "ongaonga", "u" as in the Danish word "hus", "ø" as in the Danish word "øse", and "y" as in the Danish word "lys", respectively.

First, a mathematical explanation of the principles of the invention is given.

A bandpass filter may be represented in the time domain by an impulse response and can be expressed as

    f(t) = h(t)\cos(\omega_c t)                                (3)

where h(t) is the impulse response of a lowpass filter and ω_c is the centre frequency of the bandpass filter f(t). The term cos(ω_c t) may be regarded as representing a frequency shift of the lowpass filter to a bandpass filter with a centre frequency at ω_c. This is illustrated in FIG. 1, where F(ω) and H(ω) are the corresponding frequency characteristics of f(t) and h(t).

Let the IBP filters each be composed of a simple bandpass filter, BP, with a zero at the origin and two complex-conjugate poles in the left half-plane of the complex s-plane, and let the poles of the IBP filters be placed on a straight line. Then:

1) If the bandwidth is identical for all the IBP filters, the rise time and the delay time will be identical for all filters, but Q=fc/(fu-fl) will be proportional to the centre frequency fc. The zeros and the poles are shown in FIG. 2.

2) If Q is identical for all filters, the rise time and the delay time will be inversely proportional to the centre frequency, while the bandwidth will be proportional to the centre frequency. The zeros and the poles are shown in FIG. 3.
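The two pole arrangements can be compared numerically with a small table of bandwidth, Q and time window (equation (2)) for a few centre frequencies. The centre frequencies, the 500 Hz bandwidth and Q = 2 below are arbitrary example values, and `ibp_table` is a hypothetical helper.

```python
def ibp_table(centre_freqs_hz, bandwidth_hz=None, q=None):
    """(fc, bandwidth, Q, tw) for IBP filters with identical bandwidth or identical Q.

    tw is the time window of equation (2), tw = 2 / bandwidth (twice the rise time).
    Exactly one of bandwidth_hz or q must be given.
    """
    if (bandwidth_hz is None) == (q is None):
        raise ValueError("give exactly one of bandwidth_hz or q")
    rows = []
    for fc in centre_freqs_hz:
        bw = bandwidth_hz if bandwidth_hz is not None else fc / q
        rows.append((fc, bw, fc / bw, 2.0 / bw))
    return rows


centres = [1000.0, 2000.0, 4000.0]
for label, rows in (("case 1, identical bandwidth", ibp_table(centres, bandwidth_hz=500.0)),
                    ("case 2, identical Q", ibp_table(centres, q=2.0))):
    print(label)
    for fc, bw, qf, tw in rows:
        print(f"  fc={fc:6.0f} Hz  B={bw:6.0f} Hz  Q={qf:4.1f}  tw={1e3 * tw:5.2f} ms")
```

In case 1 the time window stays fixed while Q grows with fc; in case 2 Q stays fixed while the time window, and hence the rise time, shrinks as 1/fc.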

It is assumed that the rise time and the delay time are identical for the IBP filters within the frequency range which is of interest for the analysis of the transient conditions. If this is not the case, it is assumed that the brain will compensate for it. The only effect is that the rise time will be slower and the delay time longer with falling frequencies (if Q is identical). The rhythm and the shape of the transients will be the same.

In short time analysis the transient component in a signal is a matter of definition. The idea is to get an expression that gives a response corresponding to the response in the cochlea to an abrupt change in the signal energy. An abrupt change in the signal energy corresponds to the transient component in the auditory signal.

The composition of the transient and the steady state component in a signal may be identified by envelope detection, where the steady state component is the DC component in the detected envelope and the transient component is identified as the changes in the level of the envelope.

The transient response may be identified by envelope detection.

The envelope of the impulse response can be expressed as

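    f_t(t) = \left[ f(t)^2 + \hat{f}(t)^2 \right]^{1/2}        (4)

where \hat{f}(t) is the Hilbert transform of f(t).

By substituting (3) into (4) we have

    f_t(t) = \left\{ \left[ h(t)\cos(\omega_c t) \right]^2 + \left[ \widehat{h(t)\cos(\omega_c t)} \right]^2 \right\}^{1/2}   (5)

For the Hilbert transform we have

    \widehat{u(t)v(t)} = u(t)\hat{v}(t)                         (6)

if the spectra of u(t) and v(t) do not overlap.

Hence, since the Hilbert transform of cos(ω_c t) is sin(ω_c t), we have

    f_t(t) = \left\{ \left[ h(t)\cos(\omega_c t) \right]^2 + \left[ h(t)\sin(\omega_c t) \right]^2 \right\}^{1/2}   (7)

and

    f_t(t) = |h(t)|                                            (8)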

based on the assumption that the spectrum of h(t) does not overlap the centre frequency ω_c. Under this condition the envelope of the impulse response is independent of the centre frequency. This is illustrated in FIG. 4, which shows how different impulse responses result in the same envelope.
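Equation (8) can be checked numerically. The sketch below assumes h(t) = t·exp(-t/τ) (a double pole on the negative real axis, so h(t) does not oscillate), τ = 2 ms, a 48 kHz sample rate and three example centre frequencies; these values are illustrative choices, not taken from the text. The envelope of equation (4) is computed with an FFT-based Hilbert transform.

```python
import numpy as np


def analytic_envelope(x):
    """Envelope [x^2 + x_hat^2]^(1/2) of equation (4), with the Hilbert transform
    x_hat obtained by zeroing negative frequencies in the FFT."""
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    weights[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))


fs = 48_000.0                                  # assumed sample rate
t = np.arange(int(0.04 * fs)) / fs             # 40 ms
tau = 2e-3                                     # assumed time constant of h(t)
h = t * np.exp(-t / tau)                       # double pole on the negative real axis
for fc in (4_000.0, 5_000.0, 6_000.0):         # assumed centre frequencies
    env = analytic_envelope(h * np.cos(2 * np.pi * fc * t))   # equations (3) and (4)
    err = np.max(np.abs(env - np.abs(h))) / np.max(h)
    print(f"fc = {fc:.0f} Hz: max deviation from |h(t)| = {100 * err:.2f} % of peak")
```

The small residual comes from the tail of the spectrum of h(t) that reaches the centre frequency, i.e. from the partial violation of the non-overlap condition stated above.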

By (8), the total envelope of the IBP filters is the sum of the envelopes of the individual bandpass filters.

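An accumulated transient response ftt(t) can thus be expressed by summing ft(t) over the IBP filters. Treating the IBP filters as a continuum of centre frequencies, this summation can be expressed as

    f_{tt}(t) = \int_{\omega_{cl}}^{\omega_{cu}} f_t(t)\, d\omega_c = \int_{\omega_{cl}}^{\omega_{cu}} |h(t)|\, d\omega_c   (9)

and

    f_{tt}(t) = |h(t)|(\omega_{cu} - \omega_{cl})              (10)

where ω_cl is the centre frequency of the lower IBP filter and ω_cu is the centre frequency of the upper IBP filter.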

FIG. 5 shows a spectrogram for the words "linear prediction" when pronounced by a man. The spectrogram is recorded with bandpass filters with a bandwidth of 300 Hz and centre frequencies in the range from about 150 Hz up to about 4 kHz. The ordinate is the frequency, the abscissa is the time, and the darkness of the ink indicates the signal energy. The horizontally oriented black bands are dominating frequency bands in the speech and are called formants. The vertical thin lines correspond to abrupt energy changes and thus to the transient components of the signal. A spectrogram is usually used for formant analysis, and a bandwidth of 300 Hz is not sufficient for transient analysis, but the appearance of the lines confirms that the transient signal is independent of the centre frequency of the bandpass filters.

As mentioned above, the cochlea may be regarded as having an infinite number of bandpass filters, but it would be advantageous to be able to detect the transient signal without the use of a large number of bandpass filters.

FIG. 6 illustrates how a summation of an infinite number of bandpass filters, IBP, can be performed by one bandpass filtration, BP, having a bandwidth that covers the cutoff frequencies of the lower and the upper IBP filter, IBP_l and IBP_u. Preferably, the bandpass filter BP should be of the maximally flat delay (Bessel) type, as this type of filter is well suited for preserving the shape of a transient condition.

In practice the simplest way to detect the envelope is to use a rectifier and a lowpass filter, see for example "Communication Systems. An introduction to Signal and Noise in Electrical Communication", McGraw-Hill Kogakusha 1968, A. Bruce Carlson. From equation (10) it can be seen that the accumulated transient component may be detected by performing a highpass filtration, BP, covering the range of IBP that needs to be accumulated, before the envelope detection. An envelope detection corresponds to a frequency shift by the centre frequency ω_c of the bandpass filter to a lowpass filter with half the bandwidth of the bandpass filter. This means that the cutoff frequency of the lowpass filter determines the bandwidth of all the IBP covered by the BP. This principle is illustrated in FIG. 7.

In FIG. 7 the digitized sound signal S(t) enters a bandpass or highpass filter BP, 10, the output of the bandpass filter is input into a rectifying unit 11, the output of which is input into a lowpass filter LP, 12. The output of the lowpass filter 12 is designated ftt(t) and represents a detection of the envelope and thus a detection of the transient response of the sound signal S(t).
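A minimal digital sketch of the FIG. 7 chain (BP filter 10, rectifying unit 11, LP filter 12) is given below. Several stand-ins are assumptions: a 48 kHz sample rate, a single second-order bandpass section in the RBJ audio-EQ-cookbook form in place of the preferred maximally flat delay BP, and two cascaded one-pole lowpass sections (real poles only, hence a non-oscillating impulse response) in place of LP. The cutoffs 3000 Hz, 6000 Hz and 700 Hz are the preferred values given for FIGS. 9 and 10; a one-way (half-wave) rectifier is used.

```python
import math
import numpy as np


def biquad_bandpass(x, f_lo, f_hi, fs):
    """Second-order bandpass between f_lo and f_hi (RBJ cookbook form, 0 dB peak gain)."""
    f0 = math.sqrt(f_lo * f_hi)                      # geometric centre frequency
    w0 = 2 * math.pi * f0 / fs
    bw_octaves = math.log2(f_hi / f_lo)
    alpha = math.sin(w0) * math.sinh(math.log(2) / 2 * bw_octaves * w0 / math.sin(w0))
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        # Numerator coefficients: b0 = alpha, b1 = 0, b2 = -alpha.
        yn = (alpha * xn - alpha * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1, y2, y1 = x1, xn, y1, yn
        y[n] = yn
    return y


def real_pole_lowpass(x, fc, fs, stages=2):
    """Cascade of one-pole lowpass sections; all poles are real, so the
    impulse response does not oscillate around zero."""
    k = 1 - math.exp(-2 * math.pi * fc / fs)
    y = np.asarray(x, dtype=float)
    for _ in range(stages):
        out = np.zeros(len(y))
        acc = 0.0
        for n, v in enumerate(y):
            acc += k * (v - acc)
            out[n] = acc
        y = out
    return y


def transient_signal_detector(s, fs, bp_hz=(3000.0, 6000.0), lp_hz=700.0):
    """BP filter (10) -> one-way rectifier (11) -> LP filter (12): returns ftt(t)."""
    band = biquad_bandpass(np.asarray(s, dtype=float), bp_hz[0], bp_hz[1], fs)
    rectified = np.maximum(band, 0.0)                # one-way (half-wave) rectifier
    return real_pole_lowpass(rectified, lp_hz, fs)


fs = 48_000.0                                        # assumed sample rate
t = np.arange(int(0.1 * fs)) / fs
s = np.where(t >= 0.05, np.sin(2 * np.pi * 4000.0 * t), 0.0)   # tone burst from 50 ms
ftt = transient_signal_detector(s, fs)
```

For the tone burst, ftt(t) stays at zero before the onset, rises within a few milliseconds and then settles to a nearly constant (DC) level, matching the split into a transient part (the change in level) and a steady state part (the DC component) described above.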

From the mathematical definition of a transient part of a signal it can be concluded that the poles of h(t) will be located on the negative real axis in the s-plane. This means that the impulse response will not oscillate around zero (a transient response is a non-oscillating signal). From equation (10) it can be seen that the limits ω_cu and ω_cl of the IBP filters affect only the magnitude of ftt(t).

The bandpass filtration, BP, sets the limits for the summation of the transient responses of the IBP filters, and its amplitude characteristic weights the contribution from the IBP filters. If a lowpass filter is used instead of BP, the spectrum of h(t) will overlap the centre frequency of the lower IBP filter. The bandpass filter BP should have a bandwidth of at least twice the cutoff frequency of the lowpass filter LP. The bandwidth and the amplitude characteristic can be utilized for optimizing different signal analyses when using the method according to the invention.

In principle the poles of the lowpass filter LP should be located on the negative real axis for a mathematical transient detecting system. However, when dealing with auditory signals, it is the characteristic of the cochlea which is decisive; but there should preferably be no significant oscillations within the impulse response, as this could make the transient conditions of the auditory signal more indistinct.

The cutoff frequency of the lowpass filter LP is an expression for the transient conditions of the signal, and in connection with auditory signals this frequency should result in a rise time corresponding to the rise time of the cochlea. The cutoff frequency may be regarded as an index of transients, where a low cutoff frequency will result in transient detection of only those signal elements having a slow rise time, and where a high cutoff frequency will also result in detection of signal elements having a fast rise time.
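To make the link between LP cutoff and rise time concrete, the sketch below measures the 10-90 % rise time of a single discrete one-pole lowpass and compares it with the standard first-order result t_r ≈ 2.2RC ≈ 0.35/fc. A first-order section, a 48 kHz sample rate and the 10-90 % definition are assumptions for illustration; the LP filters discussed in the text are of higher order, so the figures are indicative only.

```python
import math
import numpy as np


def one_pole_rise_time(fc, fs, lo=0.1, hi=0.9):
    """10-90 % rise time (s) of a discrete one-pole lowpass with cutoff fc.

    Uses the closed-form step response y[n] = 1 - (1 - k)**(n + 1) and locates
    each crossing by linear interpolation between samples.
    """
    k = 1 - math.exp(-2 * math.pi * fc / fs)
    n_samples = int(10 * fs / fc) + 10                  # long enough to pass 90 %
    y = 1 - (1 - k) ** np.arange(1, n_samples + 1)

    def crossing(level):
        i = int(np.argmax(y >= level))                  # first sample at/above level
        prev = y[i - 1] if i > 0 else 0.0               # the step starts from zero
        return (i - 1) + (level - prev) / (y[i] - prev)

    return (crossing(hi) - crossing(lo)) / fs


for fc in (400.0, 700.0, 1200.0):
    print(f"fc = {fc:6.0f} Hz: rise time {1e3 * one_pole_rise_time(fc, 48_000.0):.3f} ms, "
          f"0.35/fc = {1e3 * 0.35 / fc:.3f} ms")
```

Across the 400-1200 Hz range preferred for LP, the rise time scales as 1/fc, so a higher cutoff admits faster transients, as stated above.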

The fact that the nerve pulses from the ear are synchronized to the frequency below about 1.4 kHz and not above indicates that the ear is tone oriented below 1.4 kHz and transient oriented above. In the transient oriented range the nerve pulses are synchronized to transients, corresponding to abrupt energy changes, in the signal.

The cutoff frequencies of the BP should correspond to the transient sensitive range of the cochlea (theoretically it should have an amplitude characteristic corresponding to the sensitivity curve of the ear). The sensitivity curve for human hearing indicates that the lower cutoff frequency must be about 2 kHz and the upper about 5 kHz. The amplitude characteristic of the BP filter will weight the contributions from the individual IBP filters.

From the above discussion a transient detection and analysis system according to the invention may be constructed as shown in the block diagram of FIG. 8. In FIG. 8 a sound signal is input into a microphone 13, the output of which is passed through a lowpass filter 14 before being digitized by an A/D converter 15. The output S(t) of the A/D converter is led to a highpass or bandpass filter BP, 10, the output of the bandpass filter is input into a rectifying unit 11, the output of which is input into a lowpass filter LP, 12, see also FIG. 7. The output of the lowpass filter 12 is designated ftt(t) and represents the transient components of the input signal. In order to analyse the transient components, the output signal of the lowpass filter 12 should preferably be led into equipment for signal analysis or recognition 16.

FIGS. 9 and 10 show the characteristics of a preferred highpass filter and lowpass filter to be used in the systems of FIGS. 7 or 8. The bandpass filter BP to be used as the highpass filter 10 in FIGS. 7 or 8 should have a lower cutoff frequency of at least 2000 Hz, preferably about 3000 Hz. The upper cutoff frequency should be in the range between 4500 and 7000 Hz, preferably about 6000 Hz. The characteristic shown in FIG. 9 has a lower cutoff frequency of 3014 Hz. The lowpass filter LP to be used in FIGS. 7 or 8 should have a cutoff frequency in the range of 400-1200 Hz, preferably about 700 Hz. The characteristic shown in FIG. 10 has a cutoff frequency of 732 Hz. It would also be possible to construct a transient detection system according to FIGS. 7 or 8 by using a full-wave rectifier. However, it is preferred to use a one-way (half-wave) rectifier as illustrated in FIGS. 7 and 8.
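The preferred ranges above, together with the requirement that the BP bandwidth be at least twice the LP cutoff frequency, can be collected into a small settings check. The dataclass `TSDParameters` and its field names are hypothetical; the limits are those stated in the text.

```python
from dataclasses import dataclass


@dataclass
class TSDParameters:
    """Filter settings for the transient detector of FIGS. 7 and 8 (hypothetical container)."""
    bp_lower_hz: float = 3000.0
    bp_upper_hz: float = 6000.0
    lp_cutoff_hz: float = 700.0

    def problems(self):
        """Return a list of deviations from the ranges stated in the text."""
        issues = []
        if self.bp_lower_hz < 2000.0:
            issues.append("BP lower cutoff should be at least 2000 Hz")
        if not 4500.0 <= self.bp_upper_hz <= 7000.0:
            issues.append("BP upper cutoff should lie in 4500-7000 Hz")
        if not 400.0 <= self.lp_cutoff_hz <= 1200.0:
            issues.append("LP cutoff should lie in 400-1200 Hz")
        if self.bp_upper_hz - self.bp_lower_hz < 2 * self.lp_cutoff_hz:
            issues.append("BP bandwidth should be at least twice the LP cutoff")
        return issues


print(TSDParameters().problems())                        # preferred values: no issues
print(TSDParameters(5000.0, 5500.0, 700.0).problems())   # BP too narrow for a 700 Hz LP
```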

FIG. 11 illustrates the sensitivity of the human ear by showing the response of the cochlea to tones. As already mentioned, the perception is tone oriented up to about 1.4 kHz and transient oriented above 1.4 kHz.

As mentioned above and illustrated in FIG. 6, the total envelope for the IBP filters is obtained by a summation of the envelopes of the individual bandpass filters, and the summation of an infinite or high number of bandpass filters IBP can be performed by one bandpass filtration BP. This principle is used in the diagram shown in FIG. 7. However, a summation of a number of bandpass filters may also be realized by using a filter bank method in which the envelopes of a number of individual bandpass filters are detected and summed. Thus, each branch within the filter bank is composed of a bandpass filter with a specific centre frequency, a rectifying unit and a lowpass filter, and the outputs of the lowpass filters are summed in order to obtain the total envelope.
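The filter bank alternative can be sketched as a set of branches, each with its own bandpass filter, one-way rectifier and lowpass filter, whose outputs are summed. The branch filters (RBJ-cookbook bandpass sections and a one-pole lowpass), the 500 Hz branch bandwidth, the 3.0-6.0 kHz centre grid and the 48 kHz sample rate are assumptions made only for this illustration.

```python
import math
import numpy as np


def bandpass(x, f_lo, f_hi, fs):
    """Second-order bandpass (RBJ audio-EQ cookbook form, 0 dB peak gain)."""
    f0 = math.sqrt(f_lo * f_hi)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) * math.sinh(math.log(2) / 2 * math.log2(f_hi / f_lo) * w0 / math.sin(w0))
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = (alpha * xn - alpha * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1, y2, y1 = x1, xn, y1, yn
        y[n] = yn
    return y


def lowpass(x, fc, fs):
    """One-pole lowpass (single real pole, non-oscillating impulse response)."""
    k = 1 - math.exp(-2 * math.pi * fc / fs)
    y = np.zeros(len(x))
    acc = 0.0
    for n, v in enumerate(x):
        acc += k * (v - acc)
        y[n] = acc
    return y


def filter_bank_envelope(s, fs, centres_hz, branch_bw_hz, lp_cutoff_hz):
    """Sum of the envelopes of several BP -> rectifier -> LP branches."""
    s = np.asarray(s, dtype=float)
    total = np.zeros(len(s))
    for fc in centres_hz:
        band = bandpass(s, fc - branch_bw_hz / 2, fc + branch_bw_hz / 2, fs)
        total += lowpass(np.maximum(band, 0.0), lp_cutoff_hz, fs)
    return total


fs = 48_000.0                                   # assumed sample rate
t = np.arange(int(0.05 * fs)) / fs
centres = np.arange(3000.0, 6001.0, 500.0)      # 3.0, 3.5, ..., 6.0 kHz (assumed grid)
in_band = filter_bank_envelope(np.sin(2 * np.pi * 4000 * t), fs, centres, 500.0, 700.0)
out_band = filter_bank_envelope(np.sin(2 * np.pi * 500 * t), fs, centres, 500.0, 700.0)
```

A tone inside the covered range produces a much larger total envelope than a 500 Hz tone, which lies in the tone oriented range that the bank deliberately excludes.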

Now, some introductory experiments illustrated by FIGS. 12 and 13 will be discussed:

Two experiments were carried out in order to evaluate the cutoff frequencies for the BP and the LP filters and to evaluate the suitability of the method for speech recognition.

1. Experiment by listening to an amplitude modulated signal

To obtain a first indication of the cutoff frequency for the LP filter under controlled conditions, a listening experiment was carried out with an amplitude modulated signal in the sensitive frequency range of the ear. The experiment is somewhat artificial because normally there would not be so intense a signal in that range, and repeating the experiment is not recommended because it is very hard on the ear.

The carrier frequency was chosen to be 3.5 kHz, and the modulation tone was tuned upwards from a few Hz. Up to 350-400 Hz the envelope signal sounds like a buzz. After that it sounds first like a hollow /u(:)/ and at 800 Hz like a sharp /i(:)/. Above 800 Hz it was not possible to hear the envelope signal. If the tone is increased further, at a given point one will hear different mixed tones.

The sound was of course dominated by the carrier frequency, but the experiment indicated that the cutoff frequency for the LP filter probably has to be less than 1-1.2 kHz.

The modulation index was about 0.75. When it is greater than 1, overtones can be observed.
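The stimulus of this experiment and the effect of the modulation index can be reproduced numerically. The sketch assumes a 48 kHz sample rate, a 100 ms signal and a 400 Hz modulation tone (one point of the swept range), with the 3.5 kHz carrier from the text. The envelope is computed with the Hilbert transform of equation (4) rather than with a rectifier. For an index above 1 the envelope becomes |1 + m·cos(ωm t)|, whose folded shape contains harmonics of the modulation tone, i.e. overtones.

```python
import numpy as np


def am_stimulus(fs, duration_s, carrier_hz=3500.0, mod_hz=400.0, index=0.75):
    """Amplitude-modulated tone (1 + m*cos(wm t)) * cos(wc t)."""
    t = np.arange(int(round(duration_s * fs))) / fs
    modulation = 1.0 + index * np.cos(2 * np.pi * mod_hz * t)
    return t, modulation, modulation * np.cos(2 * np.pi * carrier_hz * t)


def analytic_envelope(x):
    """|x + j*H{x}| via the FFT (negative frequencies removed, positive doubled)."""
    n = len(x)
    w = np.zeros(n)
    w[0] = 1.0
    w[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        w[n // 2] = 1.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * w))


def harmonic_amplitude(x, fs, freq_hz):
    """Amplitude of the component of x at freq_hz (assumed to fall on an FFT bin)."""
    k = int(round(freq_hz * len(x) / fs))
    return 2 * abs(np.fft.rfft(x)[k]) / len(x)


fs = 48_000.0
for m in (0.75, 1.5):
    t, modulation, s = am_stimulus(fs, 0.1, index=m)
    env = analytic_envelope(s)
    print(f"m = {m}: 400 Hz component {harmonic_amplitude(env, fs, 400.0):.3f}, "
          f"800 Hz overtone {harmonic_amplitude(env, fs, 800.0):.3f}")
```

With m = 0.75 the envelope is a pure 400 Hz modulation; with m = 1.5 an 800 Hz overtone appears.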

2. Analysis of transient signals for four vowels

Selection of vowels:

FIG. 12 shows average formant frequencies for the American vowels /i(:)/, /ae(:)/, /a(:)/, and /u(:)/ as in heed, had, hod, and who'd for men, women, and children. These vowels are well dispersed among the vowels, so they were selected for the experiment.

The vowels were recorded (with a Danish accent), pronounced by a man, a woman, and a child, on an ordinary cassette recorder.

Setup for the experiment:

An analog TSD (Transient Signal Detector) was designed in accordance with FIG. 7. The design was based on the operational amplifier LM 833.

The specifications for the filters were:

The BP filter was a fourth-order Chebyshev filter with 1 dB ripple. The upper cutoff frequency is about 6.5 kHz and the lower is adjustable from about 550 Hz to 2.6 kHz.

The rectifier was a full-wave rectifier that inverts the negative part of the signal and adds it to the positive part.

The LP filter was a second-order Butterworth filter designed to have a cutoff frequency at 1.5 kHz (the 3 dB cutoff frequency was measured to be 2.1 kHz).

Recording vowels and detecting the transient signal:

Four vowels pronounced by a man, a woman, and a child were recorded on an ordinary radio cassette recorder. The transient signal was detected by means of the TSD, converted, and stored on a PC by means of an 8-bit A/D converter. The sampling rate when recording was 10 kHz, but when analysing the recorded data only every second sample was considered, resulting in a sampling rate of 5 kHz. An 8-bit A/D converter gives a poor dynamic range, and therefore it was necessary to record the vowels in isolation (that is, not in a word), which gives a more uncertain pronunciation.

FIGS. 13a-13p show the experimental results of the first transient analysis of the vowels of FIG. 12.

It is possible to identify the vowel by listening to the transient signal. Visual inspection of the time variation of the results showed that the same vowel pronounced by a man, a woman, and a child, respectively, had almost the same characteristics, even though the fundamental tone differed. When the vowel /a(:)/ as in the Danish word "op" was recorded, a p-sound was recorded as well. It is clearly visible in the time variation of the transient signal.

Analysis of the transient signals:

The power in the transient signals varies a lot from vowel to vowel. The signals of the vowels /a(:)/ and /u(:)/ were very weak (especially for the man's voice). It was therefore necessary to turn the volume of the radio cassette recorder up to a high level, which introduced a lot of noise.

First, a number of FFT analyses of 20 ms duration at a 5 kHz sampling rate were made at different starting points in the vowels. The spectra appear very distinct and essentially identical throughout the vowel. This strongly indicates that the signal carries important information.

In order to analyse common features, 20 ms segments (101 samples) were chosen at random from each vowel. The time signals were weighted by a Hamming window, and the FFTs were calculated. FIGS. 13a-13d show the power spectra, with the three voices plotted in the same diagram for each vowel. The corresponding transient signals are shown separately: in FIGS. 13e-13h for the woman, in FIGS. 13i-13l for the man, and in FIGS. 13m-13p for the child.
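The frame analysis described above (101 samples, Hamming window, power spectrum) can be sketched as a direct DFT in plain Python. This is an illustration under stated assumptions, not the programme used in the experiment. The window is the symmetric Hamming window, the spectrum is one-sided and expressed in dB, and the test signal is a synthetic 350 Hz tone. At 5 kHz a 101-sample frame gives a bin spacing of 5000/101 ≈ 49.5 Hz, which is the resolution available for the features discussed below.

```python
import cmath
import math

FS = 5000.0   # Hz, analysis sampling rate
N = 101       # samples in a 20 ms frame at 5 kHz (both end points included)


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def power_spectrum_db(frame, fs=FS):
    """One-sided power spectrum of a Hamming-weighted frame as (frequency Hz, dB) pairs."""
    n = len(frame)
    weighted = [x * w for x, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(weighted))
        spectrum.append((k * fs / n, 10 * math.log10(abs(acc) ** 2 + 1e-12)))
    return spectrum


# Synthetic test frame: a 350 Hz tone sampled at 5 kHz.
frame = [math.sin(2 * math.pi * 350 * i / FS) for i in range(N)]
peak_freq, peak_db = max(power_spectrum_db(frame), key=lambda p: p[1])
print(f"bin spacing {FS / N:.1f} Hz, strongest bin at {peak_freq:.1f} Hz")
```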

The spectra are expected to have the following features:

The spectra of the same vowel pronounced by three different voices will have some common features related to the vowel and some features related to the voice.

The spectra of different vowels pronounced by the same voice will have some features related to the different vowels and some common features from the voice.

Furthermore, it must be expected that the shape of the spectra plays a more important part than the absolute frequencies.

From the power spectra the following can be seen:

/i(:)/ (FIG. 13a)

The most remarkable feature is that the spectra of all three voices have a pronounced peak in the 300-400 Hz range, 50 Hz wide, and a pronounced notch at 200-250 Hz. Furthermore, there is a contribution at 50 Hz. The man's voice has a contribution at 150 Hz, which must be attributed to his deep voice.

/ae(:)/ (FIG. 13b)

The voices of the woman and of the man have a pronounced notch at 350 Hz (deeper than 50 dB). In this case, too, the man's voice has a contribution at 150 Hz. The child's voice does not fit the pattern as well, which might be due to an uncertain pronunciation.

/a(:)/ (FIG. 13c)

All three voices have a peak at 250-300 Hz. The frequency range is a bit lower than for /i(:)/, and the peak is less pronounced. Further, all three voices have a major contribution at and below 50 Hz.

/u(:)/ (FIG. 13d)

The voices of the child and of the woman are very much alike. They have peaks at 300 and 350 Hz and a deep, wide valley at 100 Hz. The man's voice also has a peak. Its valley is as wide as for the woman and the child but not as deep. This may be due to the deep voice and to the considerable noise the radio cassette recorder introduced into the signal.

The experiments leading to the results of FIGS. 13a-p can be seen as introductory. The results are nevertheless highly interesting, especially considering that the equipment was simple, introduced a lot of noise, and had only an 8-bit A/D converter. In spite of this, the results are outstanding. No particular data selection was made to improve the results, and there is therefore no doubt that the transient condition is of decisive importance for speech recognition.

It seems that all information might be located in the frequency range below 500 Hz. If this is the case, the required sampling frequency will be less than 1.5 kHz, and the speech signal can be analysed very intensively with several parallel processes. For instance, several time windows (such as 5, 20, and 40 ms) can be used. Spectrum analysis (FFT, LPC, cepstrum, or others) can then detect some phonemes, and time analysis (correlation or other methods) can detect other phonemes.

It is most likely that a more sophisticated TSD design will give very good results and a very robust speaker-independent speech recognition. Such a design would use an AGC amplifier as preamplifier and a logarithmic or AGC amplifier after the BP filter, to compensate for variations in the energy of the bandpass-filtered phonemes. Better results may be obtained if a 12- or 16-bit A/D converter is used instead of the 8-bit A/D converter.

Further experimental results, illustrated in FIGS. 14-18, are discussed in the following:

The method of extracting transient signal components according to the present invention may also be regarded as a pre-process of the auditory input signal. A software programme was developed to better understand and/or determine the parameters of the pre-process. With it, the output signals can be displayed and the outcome can be listened to after each step of the pre-process.

The analysis of the speech signals shown in FIGS. 14 and 15 was made with this software programme running on a Compaq Deskpro 4/66i PC. This type of PC is provided with Microsoft Windows Sound System, a microphone, and a codec chip (AD1848) from Analog Devices. The codec chip performs the sampling, the anti-aliasing filtering, and the A/D conversion.

The speech signals shown in FIGS. 14a and 15a are recorded by means of this Sound System. The speech signal is sampled at 11025 Hz with 16-bit linear PCM. The passband is wider than 4.9 kHz.

Pretransient signals are shown in FIGS. 14b and 15b. These signals are the speech signals filtered by a third-order IIR digital highpass filter with a cutoff frequency of 3.0 kHz. The filter is a bilinear transformation of a third-order Butterworth filter.
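The highpass stage can be sketched as follows for the 11025 Hz sampling rate. The sketch assumes that the bilinear transformation is prewarped, so that the -3 dB point lands exactly at 3.0 kHz. The text does not state whether prewarping was used. The third-order Butterworth prototype is factored into a first-order section and a second-order section with Q = 1, and each section is mapped with the bilinear transform. The function names are illustrative. A frequency-response check confirms the expected behaviour: zero gain at DC, unity gain at the Nyquist frequency, and 1/√2 at the cutoff.

```python
import cmath
import math


def butter3_highpass(fc, fs):
    """Third-order Butterworth highpass via the prewarped bilinear transform.

    Returns a list of (b, a) sections whose cascade realises
    s^3 / ((s + 1)(s^2 + s + 1)) with s normalised to the cutoff.
    """
    k = math.tan(math.pi * fc / fs)                  # prewarped normalised cutoff
    # First-order section s / (s + 1)
    b1 = [1.0 / (1.0 + k), -1.0 / (1.0 + k)]
    a1 = [1.0, (k - 1.0) / (1.0 + k)]
    # Second-order section s^2 / (s^2 + s + 1), i.e. Q = 1
    a0 = 1.0 + k + k * k
    b2 = [1.0 / a0, -2.0 / a0, 1.0 / a0]
    a2 = [1.0, (2.0 * k * k - 2.0) / a0, (1.0 - k + k * k) / a0]
    return [(b1, a1), (b2, a2)]


def magnitude(sections, f, fs):
    """|H(e^{j 2 pi f / fs})| of a cascade of (b, a) sections."""
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    h = 1.0 + 0j
    for b, a in sections:
        num = sum(c * z_inv ** i for i, c in enumerate(b))
        den = sum(c * z_inv ** i for i, c in enumerate(a))
        h *= num / den
    return abs(h)


def filter_cascade(sections, x):
    """Apply the sections in turn (direct form I, a[0] == 1)."""
    for b, a in sections:
        y = []
        for n in range(len(x)):
            acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
            acc -= sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0)
            y.append(acc)
        x = y
    return x


FS = 11025.0   # Hz, codec sampling rate
HP = butter3_highpass(3000.0, FS)
for f in (500.0, 3000.0, 4900.0):
    print(f"{f:6.0f} Hz: gain {magnitude(HP, f, FS):.3f}")
dc_response = filter_cascade(HP, [1.0] * 200)   # a constant input should die out
```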

The cutoff frequency of 3.0 kHz has been chosen so that the passband lies in the most sensitive area of the cochlea. In this case the passband runs from 3.0 kHz to 4.9 kHz, where the 4.9 kHz limit is given by the codec chip. The highpass or bandpass filter will be optimal if it has a maximally flat delay characteristic in accordance with equation (10).

The transient signals shown in FIGS. 14c and 15c are the pretransient signals rectified and filtered by a second-order IIR digital lowpass filter with a cutoff frequency of about 700 Hz. The filter is a bilinear transformation of a second-order Butterworth filter.
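The envelope stage (rectification followed by the 700 Hz second-order Butterworth lowpass) can be sketched in the same way. This sketch assumes full-wave rectification, as in the analog TSD described earlier. The text does not specify the rectifier of the digital version, and a one-way rectifier would use max(v, 0) instead. The bilinear transform is again assumed to be prewarped, and the function names are illustrative.

```python
import math


def butter2_lowpass(fc, fs):
    """Second-order Butterworth lowpass via the prewarped bilinear transform."""
    k = math.tan(math.pi * fc / fs)
    q = 1.0 / math.sqrt(2.0)                       # Butterworth pole quality factor
    a0 = 1.0 + k / q + k * k
    b = [k * k / a0, 2.0 * k * k / a0, k * k / a0]
    a = [1.0, (2.0 * k * k - 2.0) / a0, (1.0 - k / q + k * k) / a0]
    return b, a


def lfilter(b, a, x):
    """Direct form I difference equation with a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0)
        y.append(acc)
    return y


def transient_signal(pretransient, fs=11025.0, fc=700.0):
    """Rectify the highpass-filtered (pretransient) signal and lowpass filter it."""
    rectified = [abs(v) for v in pretransient]      # full-wave rectification
    b, a = butter2_lowpass(fc, fs)
    return lfilter(b, a, rectified)


# Example: 10 ms of silence, then a 3.5 kHz tone burst of unit amplitude.
FS = 11025.0
silence = [0.0] * int(0.010 * FS)
burst = [math.sin(2 * math.pi * 3500 * n / FS) for n in range(int(0.100 * FS))]
envelope = transient_signal(silence + burst, FS)
```

The output stays at zero during the silence. Once the burst has started, it settles near the mean of the rectified tone, about 2/π ≈ 0.64.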

The lowpass filter shall preserve the shape of the transient pulse corresponding to a transient response in the cochlea, so a filter that is able to do this will be an optimal filter. The nerves in the cochlea are able to fire nerve pulses at rates of up to about 1.4 kHz. A bandwidth of 1.4 kHz for the IBP filters in the transient-oriented area is transformed by the envelope detection into a lowpass cutoff frequency of 700 Hz. This is why a cutoff frequency of about 700 Hz has been chosen.

The transient signal may be regarded as an expression of the energy change in the signal.

All the signals presented in FIGS. 14 and 15 are normalized to a maximum signal level, which means that the largest absolute signal value is equal to 32766. The abscissas in FIGS. 14 and 15 represent a time interval of 50 ms. The ordinates in FIGS. 14a, 15a and FIGS. 14b, 15b represent the sound pressure of the corresponding speech signal, whereas the ordinates of FIGS. 14c, 15c represent the energy of the corresponding transient speech signal.

It is possible to listen to the speech, pretransient, and transient signals, corresponding to FIGS. 14a, 15a, 14b, 15b, and 14c, 15c, respectively. One of the main requirements when selecting the filter characteristics is that, when listened to, all of these signals must retain a sound close to the original speech signal.

Referring to the system illustrated in FIG. 7, FIG. 14 shows curves of the vowel "e" as in "heat" pronounced by a man:
(a) the speech signal before filtering, corresponding to the digitized input signal S(t) in FIG. 7;
(b) the signal after highpass filtering, corresponding to the output signal of the bandpass filter 10 in FIG. 7;
(c) the signal after rectification and lowpass filtering, corresponding to the output signal of the lowpass filter 12 in FIG. 7.

FIG. 15 shows curves similar to those of FIG. 14 for the vowel "o" as in "hop".

The rise and fall times and the width or duration of the transient pulse are observed to be important for the sound of a vowel. FIGS. 16-18 give examples of measured transient pulses. The time window of the vowel "e" as in "heat" pronounced by a man, shown in FIG. 16a, corresponds to the processed signal shown in FIG. 14c. The corresponding time window for the vowel "e" as in "heat" pronounced by a child is shown in FIG. 16b. From FIGS. 16a and 16b it can be observed that the leading and lagging edges of the most dominant pulses are sharp, with rise and fall times of about 0.4 ms or less. The width of the dominant pulses is about 0.8 ms when measured at about the 50% level.
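The rise time, fall time, and width at the 50% level quoted above can be measured from a sampled transient pulse as sketched below. The text does not define the rise and fall times precisely. This sketch assumes the common 10%-90% definition relative to the pulse peak and uses linear interpolation between samples. It also assumes the window contains a single dominant pulse. The function names are illustrative.

```python
def _crossing(x, i0, i1, level, fs):
    """Interpolated time (s) at which x crosses `level` between samples i0 and i1."""
    frac = (level - x[i0]) / (x[i1] - x[i0])
    return (i0 + frac * (i1 - i0)) / fs


def leading_crossing(x, peak, level, fs):
    """Last upward crossing of `level` before the peak."""
    i = peak
    while i > 0 and x[i - 1] >= level:
        i -= 1
    return 0.0 if i == 0 else _crossing(x, i - 1, i, level, fs)


def lagging_crossing(x, peak, level, fs):
    """First downward crossing of `level` after the peak."""
    i = peak
    while i < len(x) - 1 and x[i + 1] >= level:
        i += 1
    return i / fs if i == len(x) - 1 else _crossing(x, i, i + 1, level, fs)


def pulse_parameters(x, fs):
    """Return (rise_time, fall_time, width_at_50_percent) in seconds."""
    peak = max(range(len(x)), key=lambda i: x[i])
    top = x[peak]
    rise = leading_crossing(x, peak, 0.9 * top, fs) - leading_crossing(x, peak, 0.1 * top, fs)
    fall = lagging_crossing(x, peak, 0.1 * top, fs) - lagging_crossing(x, peak, 0.9 * top, fs)
    width = lagging_crossing(x, peak, 0.5 * top, fs) - leading_crossing(x, peak, 0.5 * top, fs)
    return rise, fall, width


# Synthetic trapezoid at 100 kHz: 0.4 ms linear rise, 0.4 ms flat top, 0.4 ms linear fall.
FS = 100_000
pulse = ([0.0] * 10 + [k / 40 for k in range(40)] + [1.0] * 40
         + [1 - k / 40 for k in range(1, 41)] + [0.0] * 10)
rise, fall, width = pulse_parameters(pulse, FS)
print(f"rise {rise * 1e3:.2f} ms, fall {fall * 1e3:.2f} ms, width {width * 1e3:.2f} ms")
```

The 10%-90% definition is only one choice. Other thresholds change the numbers, so pulses should only be compared when measured the same way.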

The time window of the vowel "o" as in "hop" pronounced by a man, shown in FIG. 17a, corresponds to the processed signal shown in FIG. 15c. The corresponding time window for the vowel "o" as in "hop" pronounced by a child is shown in FIG. 17b. From FIGS. 17a and 17b it can be observed that the leading and lagging edges of the most dominant pulses are sharp, with rise and fall times of about 0.5 ms. However, the width of the dominant pulses is about 1.5 ms when measured at about the 50% level. The dip in the dominant pulses of FIG. 17b is not deep enough to influence the perception. It should be noted that the vowel "o" as in "hop" is a sharp vowel. A softer vowel will have a slower lagging edge.

FIG. 18 shows the time window of the processed signal of the vowel "a" as in "have" pronounced by a man. It is to be observed that this transient pulse has softer leading and lagging edges than the pulses shown in FIGS. 16-17.

Thus, from the above results it may be concluded that the perception of a vowel is given by the shape of the transient pulse. The transient components or pulses can be extracted from the auditory signal by the above-mentioned method of signal processing. It may further be concluded that, by analysing them, the vowels or phonemes of the speech signal may be recognised by identifying the shape of the transient pulse or pulses.

In a vowel or phoneme the transient pulse is repeated, and the repetition frequency gives the perception of pitch. In FIG. 16a the time period between two successive pulses is about 6 ms, corresponding to a man's pitch of about 170 Hz. In FIG. 16b the time period between two successive pulses is about 3.5 ms, corresponding to a child's pitch of about 280 Hz.

Thus, it may also be concluded that the pitch of the speech signal may be found by analysing the transient components or pulses extracted from the auditory signal by the above-mentioned method of signal processing. The pitch is determined from the time period between the transient pulses.
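Pitch can be estimated from the interval between successive leading edges, as concluded above, and might be sketched as follows. Here leading edges are taken as upward crossings of 50% of the largest sample. This is an assumption, not a rule stated in the text. The period is averaged over all detected intervals. The test signals are synthetic rectangular pulse trains with 6 ms and 3.5 ms periods, i.e. pitches of about 167 Hz and 286 Hz.

```python
def pulse_train(period_s, width_s, duration_s, fs):
    """Rectangular test pulses of width width_s repeated every period_s."""
    n_total = int(duration_s * fs)
    x = [0.0] * n_total
    k = 0
    while True:
        start = round(k * period_s * fs)
        if start >= n_total:
            break
        for i in range(start, min(start + round(width_s * fs), n_total)):
            x[i] = 1.0
        k += 1
    return x


def leading_edge_times(x, fs, rel_level=0.5):
    """Interpolated times (s) of upward crossings of rel_level * max(x)."""
    level = rel_level * max(x)
    times = []
    for i in range(1, len(x)):
        if x[i - 1] < level <= x[i]:
            frac = (level - x[i - 1]) / (x[i] - x[i - 1])
            times.append((i - 1 + frac) / fs)
    return times


def pitch_hz(x, fs, rel_level=0.5):
    """Pitch as the inverse of the mean interval between leading edges, or None."""
    t = leading_edge_times(x, fs, rel_level)
    if len(t) < 2:
        return None
    return (len(t) - 1) / (t[-1] - t[0])


FS = 11025
man = pulse_train(0.006, 0.0008, 0.3, FS)     # 6 ms period, as in FIG. 16a
child = pulse_train(0.0035, 0.0008, 0.3, FS)  # 3.5 ms period, as in FIG. 16b
print(f"man: {pitch_hz(man, FS):.0f} Hz, child: {pitch_hz(child, FS):.0f} Hz")
```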

Thus, when analysing auditory signals according to a preferred embodiment of the present invention, it is taken into account that the identity of the sound signal is preserved during the signal processing. This processing consists of highpass filtering of the input signal, followed by rectification and lowpass filtering.

From the above discussion it should be understood that the present invention provides a method that is very suitable for use in speech recognition.

FIG. 19 shows a block diagram of a speech recognition system according to the invention. In this system a pre-process unit 20 is provided, which comprises the bandpass filter 10, rectifying circuit 11, and lowpass filter 12 of FIG. 7. Thus, the pre-process unit is a transient detecting unit in accordance with the method of the present invention, and it may most conveniently be integrated in a single integrated circuit or chip. The system further comprises units that are normally used in speech recognition systems. These include a pattern recognition unit 21 connected to a reference library 22, a unit 23 for phoneme determination, and a unit 24 for word/sentence determination. The system shown in FIG. 19 uses template matching, but alternative approaches may be used in a recognition system.
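In its simplest form, the template matching performed by the pattern recognition unit 21 against the reference library 22 could look like the sketch below. The distance measure (Euclidean distance between peak-normalised pulse shapes of equal length), the library format, and the toy templates are all assumptions for illustration. The text does not specify how unit 21 compares shapes.

```python
import math


def normalise(pulse):
    """Scale a pulse so that its largest absolute sample is 1."""
    peak = max(abs(v) for v in pulse)
    return [v / peak for v in pulse] if peak > 0 else list(pulse)


def distance(a, b):
    """Euclidean distance between two equal-length shapes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_pulse(pulse, library):
    """Return (label, distance) of the library shape closest to `pulse`."""
    p = normalise(pulse)
    scored = [(label, distance(p, normalise(ref))) for label, ref in library.items()]
    return min(scored, key=lambda item: item[1])


# Toy reference library (hypothetical shapes, equal length, arbitrary units).
LIBRARY = {
    "short_pulse": [0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0],
    "long_pulse": [0.0, 0.5, 1.0, 1.0, 1.0, 1.0, 0.5, 0.0],
}

label, score = match_pulse([0.0, 1.4, 3.0, 1.6, 0.1, 0.0, 0.0, 0.0], LIBRARY)
print(label, round(score, 3))
```

In a real system the pulses would first have to be time-aligned or resampled to a common length. The library could also be partitioned by rise time, as claim 18 suggests.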

The reference library 22 of FIG. 19 should store a library corresponding to the shapes that can be generated by the pre-process unit 20.

It should be understood that a single-chip pre-process unit may also comprise the lowpass filter 14 and/or the A/D converter 15 shown in FIG. 8.

It is to be understood that a pre-process according to the present invention could be used in many other electronic systems where speech or sound analysis, recognition, coding, and/or decoding is required. Examples include quality measurement of audio products or systems, such as loudspeakers, hearing aids, and telecommunication systems, and quality measurement of acoustic conditions. The pre-process may also be used in connection with speech compression and decompression in narrow band telecommunication.

As illustrated in FIG. 10, the preferred cutoff frequency of the lowpass filter 12 used in a pre-process unit is below 1 kHz. Thus, all the necessary signal information of the auditory signals is represented within a rather narrow frequency range of 1 kHz. By comparison, the GSM mobile telecommunication system uses a bandwidth of around 9000 bits per second to communicate speech signals. The pre-process method or unit of the present invention should make it possible to decrease the bandwidth used for telecommunication to about 1000 bits per second. This would result in great savings within this area of communication.

Thus, it should be understood that the present method is very well suited for optimizing the bandwidth in narrow band telecommunication. It is within the scope of the invention that, when an auditory signal is transmitted in a telecommunication system, the signal is processed with the pre-process described herein before being transmitted and received by a receiver. It is preferred that, prior to transmission of the processed signal, the signal is coded into a digital representation. The coded signal is then decoded in the receiver so as to reestablish transient pulse shapes perceived by the animal ear, such as the human ear, as representing the distinct sound pictures of the auditory signal.

During the above-mentioned digital transmission the bandwidth may be chosen so as to fulfil different requirements for the quality of the received, decoded, and reestablished transient pulse. Thus, a bandwidth of at most 4000 bits per second may be selected, but it should be possible to obtain a good quality of the reestablished pulse with a bandwidth of around 2000 bits per second. However, it is preferred that the bandwidth is in the interval of 800-2000 bits per second. Some telecommunication systems, such as military systems, prefer high system performance over high quality of the reestablished signal. It is to be noted that a bandwidth of about 400 bits per second may be selected for such systems.

When the digital signals are transmitted, it is preferred that the digital information comprises information about the leading edge, lagging edge, and duration of the transient pulse representing the processed auditory signal. It is also preferred that the second and further pulses in a sequence of identical pulses are transmitted as a digital sign indicating repetition.
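A possible digital representation of this idea is sketched below. Each pulse is coded by its leading-edge rise time, its duration, and its lagging-edge fall time, quantised in 0.1 ms steps, and a repeated pulse is replaced by a single repetition symbol. The step size, the tuple layout, and the symbol "R" are hypothetical choices for illustration. The text does not define a bit-level format.

```python
REPEAT = "R"  # hypothetical symbol meaning "same pulse as the previous one"


def quantise(value_ms, step_ms=0.1):
    """Map a time in ms to an integer number of quantisation steps."""
    return round(value_ms / step_ms)


def encode(pulses, step_ms=0.1):
    """pulses: iterable of (rise_ms, duration_ms, fall_ms) tuples."""
    symbols = []
    previous = None
    for rise, duration, fall in pulses:
        code = (quantise(rise, step_ms), quantise(duration, step_ms), quantise(fall, step_ms))
        symbols.append(REPEAT if code == previous else code)
        previous = code
    return symbols


def decode(symbols, step_ms=0.1):
    """Inverse of encode (up to quantisation)."""
    pulses = []
    previous = None
    for symbol in symbols:
        code = previous if symbol == REPEAT else symbol
        pulses.append(tuple(round(c * step_ms, 6) for c in code))
        previous = code
    return pulses


sequence = [(0.4, 0.8, 0.4)] * 3 + [(0.5, 1.5, 0.5)]
coded = encode(sequence)
print(coded)
```

Because repeated pulses collapse to one symbol, a steady vowel costs little more than its first pulse. This is one way the low bit rates mentioned above might be approached, although the actual rate would depend on the chosen quantisation.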

It is also an object of the present invention to provide a method for use in speech synthesis.

From the discussion of the experimental results of FIGS. 14-18 it should be understood that the sound of each vowel or phoneme might be given by the shape of a dominant transient pulse corresponding specifically to that phoneme. Experiments have shown that transient pulses similar to the processed pulses of FIGS. 16-18 hold the information necessary to generate the sound of the phoneme.

The software developed for the transient analysis illustrated in FIGS. 14-18 can create a simple transient signal by placing points in a coordinate system, with amplitude as the ordinate and time in ms as the abscissa. One transient pulse may be created by placing one or several points, interpolating between the points with either a straight line or a sine curve, and defining a period. The signal is repeated for 300 ms and can be listened to after conversion by the D/A converter in the codec chip.
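The pulse editor described above can be imitated with a few lines of code. The pulse is defined by (time in ms, amplitude) points and interpolated either linearly or with a half-cosine ("sine curve") segment. It is then padded to one period and repeated for 300 ms. The half-cosine form of the sine interpolation and the function names are assumptions, since the original software is not described in more detail.

```python
import math


def pulse_from_points(points, fs, mode="linear"):
    """Sample a pulse defined by (time_ms, amplitude) points starting at 0 ms.

    mode="linear" joins the points with straight lines; mode="sine" uses a
    half-cosine segment between neighbouring points.
    """
    n = round(points[-1][0] * 1e-3 * fs) + 1
    samples = []
    for i in range(n):
        t_ms = i * 1e3 / fs
        for (t0, y0), (t1, y1) in zip(points, points[1:]):
            if t0 <= t_ms <= t1:
                u = (t_ms - t0) / (t1 - t0) if t1 > t0 else 1.0
                if mode == "sine":
                    u = (1.0 - math.cos(math.pi * u)) / 2.0
                samples.append(y0 + (y1 - y0) * u)
                break
        else:                                   # past the last point
            samples.append(points[-1][1])
    return samples


def repeat_pulse(pulse, period_ms, total_ms, fs):
    """Pad or truncate the pulse to one period and repeat it for total_ms."""
    period = round(period_ms * 1e-3 * fs)
    one_period = (pulse + [0.0] * period)[:period]
    total = round(total_ms * 1e-3 * fs)
    return [one_period[i % period] for i in range(total)]


FS = 10_000
points = [(0.0, 0.0), (0.4, 1.0), (1.2, 1.0), (1.6, 0.0)]   # rise, top, fall (ms)
linear = pulse_from_points(points, FS)
smooth = pulse_from_points(points, FS, mode="sine")
signal = repeat_pulse(linear, period_ms=5.0, total_ms=300.0, fs=FS)
```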

It should be noted that the pulse rise time or the form of the leading edge, the duration of the pulse, and the fall time or the form of the lagging edge are all important features for the identification, representation, and/or generation of transient pulses for use in speech recognition and/or synthesis. These features may also be used in connection with speech compression.

This is illustrated in FIGS. 20-25, which show how transient pulses used for speech synthesis or identification should be formed for the phonemes "e" as in "heat", "o" as in "hop", "o" as in "ongaonga" or as in the Danish word "Ole", "u" as in the word "who", "ø" as in the Danish word "øse", and "y" as in the Danish word "lys", respectively. The pulses are repeated with a period of 5 ms.

From FIG. 20 it can be seen that the phoneme "e" as in "heat" could be formed by a very short pulse with a duration in the range of 0.3-1.1 ms and a leading-edge rise time in the range of 0.3-0.5 ms. The fall time of the lagging edge should also be in the range of 0.3-0.5 ms.

Similarly, FIG. 21 shows that the phoneme "o" as in "hop" could be formed by a pulse with a duration in the range of 1.3-1.8 ms and a leading-edge rise time in the range of 0.3-0.5 ms. The fall time of the lagging edge should be in the range of 0.3-0.5 ms.

FIG. 22 shows that the phoneme "o" as in the Danish word "Ole" could be formed by a pulse with a leading-edge rise time in the range of 0.3-0.5 ms. The duration of this pulse, measured in its upper part, is in the range of 1.3-1.8 ms. The fall time of the lagging edge for this phoneme may vary, but should be in the range of 1.0-2.0 ms.

FIG. 23 shows that the phoneme "u" as in the word "who" could be formed by generating a transient pulse with sine-curve interpolation and a duration in the range of 1.0-2.0 ms. The preferred duration is about 1.5 ms.

FIG. 24 shows the pulse of the phoneme "ø" as in the Danish word "øse". Here the leading edge may have a rise time in the range of 0.4-0.6 ms. The fall time of the lagging edge should be in the range of 1.0-2.0 ms.

FIG. 25 shows the pulse of the phoneme "y" as in the Danish word "lys". Here the leading edge may have a rise time in the range of 1.0-2.0 ms. The fall time of the lagging edge should also be in the range of 1.0-2.0 ms.
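The ranges given for FIGS. 20-25 can be collected into a small lookup table, which can then list the phonemes compatible with a measured pulse. The sketch below encodes only the ranges stated above. It treats them as inclusive and leaves unstated parameters unconstrained (None). The table layout and function name are illustrative. Because several ranges overlap, a measurement can match more than one phoneme, and the edge form (for example the sine-shaped pulse of "u") would still be needed to decide.

```python
# (low, high) ranges in ms taken from the description of FIGS. 20-25; None = not stated.
# For 'o ("Ole")' the stated duration refers to the upper part of the pulse.
PHONEME_RANGES = {
    'e ("heat")': {"rise": (0.3, 0.5), "duration": (0.3, 1.1), "fall": (0.3, 0.5)},
    'o ("hop")': {"rise": (0.3, 0.5), "duration": (1.3, 1.8), "fall": (0.3, 0.5)},
    'o ("Ole")': {"rise": (0.3, 0.5), "duration": (1.3, 1.8), "fall": (1.0, 2.0)},
    'u ("who")': {"rise": None, "duration": (1.0, 2.0), "fall": None},
    'ø ("øse")': {"rise": (0.4, 0.6), "duration": None, "fall": (1.0, 2.0)},
    'y ("lys")': {"rise": (1.0, 2.0), "duration": None, "fall": (1.0, 2.0)},
}


def compatible_phonemes(rise_ms, duration_ms, fall_ms):
    """Labels whose stated ranges all contain the measured values."""
    measured = {"rise": rise_ms, "duration": duration_ms, "fall": fall_ms}
    matches = []
    for label, ranges in PHONEME_RANGES.items():
        if all(r is None or r[0] <= measured[key] <= r[1] for key, r in ranges.items()):
            matches.append(label)
    return matches


print(compatible_phonemes(0.4, 0.8, 0.4))
print(compatible_phonemes(0.4, 1.5, 0.4))
```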

When synthesizing human speech in accordance with the above-mentioned principles of the invention, it is preferred to generate a series of transient pulses corresponding to the series of phonemes that constitutes the speech to be synthesized. It is furthermore preferred that the series of phonemes is established from a series of letters using rule-based conversion.
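The synthesis chain just described (letters, then phonemes, then a train of phoneme-specific transient pulses) is sketched below under strong simplifying assumptions:
- The letter-to-phoneme rules are a two-entry toy table.
- Only the two phonemes whose rise time, duration, and fall time are all stated (FIGS. 20 and 21) are included, with values chosen within the stated ranges.
- Each pulse is a trapezoid whose width at 50% approximately equals the stated duration.
- Every phoneme lasts 100 ms with the 5 ms repetition period mentioned above.

All names are hypothetical.

```python
# Pulse parameters in ms, chosen within the ranges stated for FIGS. 20 and 21.
PULSE_SHAPES = {
    "e": {"rise": 0.4, "duration": 0.7, "fall": 0.4},   # "e" as in "heat"
    "o": {"rise": 0.4, "duration": 1.5, "fall": 0.4},   # "o" as in "hop"
}

# Toy rule table (hypothetical): real rule-based conversion is context dependent.
LETTER_RULES = {"e": "e", "o": "o"}


def letters_to_phonemes(text):
    """Apply the toy rules letter by letter, skipping letters without a rule."""
    return [LETTER_RULES[c] for c in text.lower() if c in LETTER_RULES]


def trapezoid_pulse(rise, duration, fall, fs):
    """Linear rise, flat top, linear fall (ms); width at 50 % is roughly `duration`."""
    top = max(duration - (rise + fall) / 2.0, 0.0)
    n_rise = max(1, round(rise * 1e-3 * fs))
    n_top = round(top * 1e-3 * fs)
    n_fall = max(1, round(fall * 1e-3 * fs))
    up = [(i + 1) / n_rise for i in range(n_rise)]
    down = [1.0 - (i + 1) / n_fall for i in range(n_fall)]
    return up + [1.0] * n_top + down


def synthesise(text, fs=10_000, period_ms=5.0, phoneme_ms=100.0):
    """Concatenate repeated transient pulses, one block per phoneme."""
    period = round(period_ms * 1e-3 * fs)
    repeats = round(phoneme_ms / period_ms)
    out = []
    for phoneme in letters_to_phonemes(text):
        pulse = trapezoid_pulse(fs=fs, **PULSE_SHAPES[phoneme])
        one_period = (pulse + [0.0] * period)[:period]
        out.extend(one_period * repeats)
    return out


signal = synthesise("eo")
```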

It should be understood that the principles of the invention can also be used for quality measurement of audio products. In such a measurement a well-defined transient signal should be transmitted to the audio product, and the distortion of the response can be measured. The distortion may be measured by using a pre-process in accordance with the principles illustrated in FIG. 7.
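The sketch below shows one way to turn "the distortion of the response" into a number. The known input pulse and the measured response (for instance after the pre-process of FIG. 7) are peak-aligned and peak-normalised, and the relative RMS difference of their shapes is reported. The metric and the function names are assumptions for illustration, since the text does not prescribe a particular distortion measure.

```python
import math


def _normalise_and_locate(x):
    """Return x scaled to unit peak magnitude and the index of that peak."""
    peak = max(range(len(x)), key=lambda i: abs(x[i]))
    scale = abs(x[peak]) or 1.0
    return [v / scale for v in x], peak


def shape_distortion(reference, response):
    """Relative RMS difference between two pulse shapes after peak alignment.

    0.0 means the response has exactly the reference shape (any delay and
    gain are ignored); larger values mean a more distorted shape.
    """
    ref, ref_peak = _normalise_and_locate(reference)
    res, res_peak = _normalise_and_locate(response)
    shift = res_peak - ref_peak
    error = energy = 0.0
    for i, r in enumerate(ref):
        j = i + shift
        s = res[j] if 0 <= j < len(res) else 0.0
        error += (r - s) ** 2
        energy += r ** 2
    return math.sqrt(error / energy) if energy > 0 else 0.0


reference = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
delayed_and_amplified = [0.0] * 3 + [2.0 * v for v in reference]
smeared = [0.0, 1 / 3, 2 / 3, 1.0, 2 / 3, 1 / 3, 0.0, 0.0]   # edges slowed by the product
```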

The principles of the invention may also be used in hearing aids in order to improve noise suppression in speech signals.

A library of features representing characteristic shapes of the transient pulses may be used for identifying the speech signal and separating it from the noise background.

The experiments presented have, for the first time, shown some common features of phonemes. These features are very simple to recognize and generate, but could be of great significance within the whole area of recognition and generation of speech or auditory signals.

The performance of the method and system of the present invention is described in the time domain. It is, however, to be understood that the transient signals, components, and/or pulses described in the time domain could also be given a corresponding description in the frequency domain, which would naturally be within the scope of the invention.

It is also to be noted that the methods of signal processing described above could be performed digitally, electronically by use of analog components, mechanically, or by any combination thereof. Such methods of processing would also be within the scope of the invention.

I claim:
 1. A method of using a shape of an abrupt energy change in an auditory signal for identifying or representing distinct sound pictures which can be perceived by an animal or human ear, said method comprising the step of: deriving, from the auditory signal, at most one transient signal including transient pulses representing abrupt energy changes in the auditory signal having a rise time of at most 2 ms.
 2. The method according to claim 1, wherein the transient signal is derived from the auditory signal by envelope detection.
 3. The method according to claim 1, wherein the distinct sound picture is a phoneme.
 4. A method according to claim 1, wherein said method is used in speech recognition.
 5. A method according to claim 1, wherein said method is used in speech compression.
 6. A method according to claim 1, used for synthesizing human speech, wherein a series of the transient pulses are generated, corresponding to a series of phonemes which constitute the human speech to be synthesized.
 7. A method according to claim 6, wherein the series of phonemes is established from a series of letters using rule-based conversion.
 8. A method according to claim 1, used in quality-measurement of audio products, the audio products being loudspeakers, hearing aids or telecommunication systems.
 9. A method according to claim 1, used in quality-measurement of acoustic conditions in a room or in an open environment.
 10. A method for identifying, in an auditory signal, energy changes which can be perceived by an animal or human ear as representing a distinct sound picture, the method comprising the steps of: deriving, from the auditory signal, at most one transient signal including transient pulses representing the abrupt energy changes in the auditory signal having a rise time of at most 2 ms, selecting a dominant pulse of the transient pulses in the transient signal, and comparing a shape of the dominant pulse with predetermined transient signal pulses representing distinct sound pictures.
 11. A method according to claim 10, wherein the shapes of the transient pulses are obtained by envelope detection of a transient response of an energy change in the auditory signal.
 12. A method according to claim 10, wherein the shape of a leading edge of the transient pulses is identified.
 13. A method according to claim 12, wherein the shape of the leading edge is determined by determining a rise time and a slope and/or slope variation of at least part of the leading edge.
 14. A method according to claim 13, wherein the rise time and the slope and/or slope variation of at least a top part of the leading edge is determined.
 15. A method according to claim 13, wherein the top part is a part beginning substantially at a point where the slope is maximum.
 16. A method according to claim 15, wherein the rise time and the slope and/or slope variation of the leading edge is determined based on at least 5 samples.
 17. A method according to claim 12, wherein the identification of the shape of the leading edge is performed by comparison with a library of references.
 18. A method according to claim 17, wherein the library of references with which comparison is made is selected based on the rise time of the leading edge.
 19. A method according to claim 10, wherein a duration of the transient pulses is identified.
 20. A method according to claim 19, wherein the duration of the transient pulses is determined as a distance from a leading edge to a lagging edge at a predetermined amplitude.
 21. A method according to claim 20, wherein the predetermined amplitude is an amplitude of at most 50% of a maximum amplitude of the pulse.
 22. A method according to claim 12 wherein the transient pulses which cannot be perceived by the animal ear are discarded from identification.
 23. A method according to claim 22, wherein a transient pulse having a leading edge with an amplitude of less than 50% of an amplitude of a preceding pulse and an onset time of less than 3.5 ms, is disregarded.
 24. A method according to claim 12, wherein the shape of a lagging edge of the transient pulses is identified.
 25. A method according to claim 24, wherein the shape of the lagging edge is determined by determining a fall time and a slope and/or slope variation of at least part of the leading edge.
 26. A method according to claim 12, wherein a time period between leading edges of the transient pulses which can be perceived by the animal ear is determined.
 27. A method according to claim 26, wherein, prior to transmission of the processed signal, the signal is coded into a digital representation, and the coded signal is decoded in the receiver so as to reestablish transient pulse shapes perceived by the animal ear such as the human ear as representing the distinct sound pictures of the auditory signal.
 28. A method according to claim 10, used in speech recognition.
 29. A method according to claim 5, used in speech compression.
 30. A method according to claim 10, used for synthesizing human speech, wherein a series of the transient pulses are generated, corresponding to a series of phonemes which constitute the human speech to be synthesized.
 31. A method according to claim 30, wherein the series of phonemes is established from a series of letters using rule-based conversion.
 32. A method according to claim 10, used in quality-measurement of audio products, the audio products being loudspeakers, hearing aids or telecommunication systems.
 33. A method according to claim 10, used in quality-measurement of acoustic conditions in a room or in an open environment.
 34. A system for processing an auditory signal to reduce the bandwidth of the auditory signal with substantial retention of information of the auditory signal, comprising the steps of: means for extracting the transient component corresponding to an abrupt energy change of the auditory signal, and means for detecting an envelope of the transient component, said means for detecting the envelope deriving, from the transient component, at most one transient signal including transient pulses representing abrupt energy changes having a rise time of at most 2 ms.
 35. A system according to claim 34, further comprising means for identifying or representing the abrupt energy changes based on a shape of the transient pulses.
 36. A system according to claim 34, wherein said means for extracting includes a bandpass filter or a highpass filter.
 37. A system according to claim 36, wherein a lower cutoff frequency of the bandpass or highpass filter is at least 2 kHz.
 38. A system according to claim 36, wherein an upper cutoff frequency of the bandpass filter is in a range between 4.5 and 7 kHz.
 39. A system according to claim 34, wherein said means for detecting includes a rectifier and a lowpass filter.
 40. A system according to claim 39, wherein the rectifier is a one-way rectifier.
 41. A system according to claim 39, wherein a cutoff frequency of the lowpass filter is in a range of 400-1000 Hz.
 42. A system according to claim 34, wherein said means for detecting includes a filter bank.
 43. A method for processing an auditory signal to reduce a bandwidth of the auditory signal with substantial retention of information of the auditory signal, comprising the steps of: extracting a transient component corresponding to an abrupt energy change of the auditory signal; and detecting an envelope of the transient component to obtain, from the transient component, at most one transient signal including transient pulses representing abrupt energy changes having rise times of at most 2 ms.
 44. A method according to claim 43, wherein transient pulses of the auditory signal, which can be perceived by an animal or human ear, as representing a distinct sound picture, are identified.
 45. A method according to claim 44, wherein the distinct sound picture is a phoneme.
 46. A method according to claim 44, wherein a shape of a leading edge of the transient pulses is identified.
 47. A method according to claim 46, wherein the shape of the leading edge is determined by determining a rise time and a slope and/or slope variation of at least part of the leading edge.
 48. A method according to claim 47, wherein the rise time and the slope and/or slope variation of at least a top part of the leading edge is determined.
 49. A method according to claim 48, wherein the top part is a part beginning substantially at a point where the slope is maximum.
 50. A method according to claim 47, wherein the rise time and the slope and/or slope variation of the leading edge is determined based on at least 5 samples.
 51. A method according to claim 46, wherein the identification of the shape of the leading edge is performed by comparison with a library of references.
 52. A method according to claim 51, wherein the library of references with which comparison is made are selected based on the rise time of the leading edge.
 53. A method according to claim 46, wherein the transient pulses which cannot be perceived by the animal ear are discarded from identification.
 54. A method according to claim 53, wherein a transient pulse having a leading edge with an amplitude of less than 50% of an amplitude of a preceding pulse and an onset time of less than 3.5 ms, is disregarded.
 55. A method according to claim 46, wherein the shape of a lagging edge of the transient pulses is identified.
 56. A method according to claim 55, wherein the shape of the lagging edge is determined by determining a fall time and a slope and/or slope variation of at least part of the leading edge.
 57. A method according to claim 46, wherein a time period between leading edges of the transient pulses, which can be perceived by the animal ear, is determined.
 58. A method according to claim 57, wherein the time period between leading edges which have a distance of at least 3 ms from each other, is determined.
 59. A method according to claim 44, wherein a duration of the transient pulses is identified.
 60. A method according to claim 59, wherein the duration of the transient pulses is determined as a distance from a leading edge to a lagging edge at a predetermined amplitude.
 61. A method according to claim 60, wherein the predetermined amplitude is an amplitude of at most 50% of a maximum amplitude of the transient pulses.
 62. A method according to claim 43, further comprising the steps of: transmitting the transient signal, and receiving the transient signal with a receiver, to thereby telecommunicate the auditory signal.
 63. A method according to claim 62, wherein, prior to transmission of the transient signal, the transient signal is digitally coded, and the coded signal is decoded in the receiver so as to re-establish the transient pulses perceived by the animal or human ear as representing distinct sound pictures of the auditory signal.
 64. A method according to claim 63, wherein transmission of the digitally coded transient signal is performed at a bandwidth of at most 4000 bits per second.
 65. A method according to claim 64, wherein the bandwidth is at most 2000 bits per second.
 66. A method according to claim 65, wherein the bandwidth is in an interval of 800-2000 bits per second.
 67. A method according to claim 63, wherein the digitally coded transient signal includes information about a leading edge, a lagging edge, and a duration of each transient pulse.
 68. A method according to claim 63, wherein the second and subsequent pulses in a sequence of identical pulses are represented by a digital sign indicating repetition.
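Claims 64 to 68 describe a compact digital code for the transient signal. The Python sketch below is illustrative only: it assumes each pulse has already been quantized into a hypothetical `PulseCode` (integer codes for leading edge, lagging edge and duration, following claim 67), uses a hypothetical `REPEAT` token as the repetition sign of claim 68, and estimates the bit rate under an assumed prefix code (one flag bit per item plus `field_bits` bits per field for each full descriptor). None of these encodings is taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PulseCode:
    """Hypothetical quantized pulse descriptor (claim 67)."""
    leading: int
    lagging: int
    duration: int


REPEAT = "REPEAT"  # hypothetical repetition sign (claim 68)


def encode(pulses):
    """Replace every pulse identical to its predecessor with REPEAT."""
    out = []
    for i, p in enumerate(pulses):
        out.append(REPEAT if i > 0 and p == pulses[i - 1] else p)
    return out


def decode(stream):
    """Expand REPEAT signs back into copies of the previous pulse."""
    pulses = []
    for item in stream:
        pulses.append(pulses[-1] if item == REPEAT else item)
    return pulses


def bits_per_second(stream, seconds, field_bits):
    """Bit rate under an assumed code: 1 flag bit per item, plus
    3 * field_bits for every full (non-repeated) pulse descriptor."""
    bits = sum(1 if item == REPEAT else 1 + 3 * field_bits for item in stream)
    return bits / seconds


a, b = PulseCode(1, 2, 3), PulseCode(4, 5, 6)
stream = encode([a, a, a, b])
print(bits_per_second(stream, 0.5, field_bits=4))  # 56.0
```

With 4-bit fields the four-pulse stream costs 13 + 1 + 1 + 13 = 28 bits; spread over 0.5 s that is 56 bit/s, far below the 4000 bit/s ceiling of claim 64. Repetition signs pay off most in sustained sounds, where long runs of identical pulses collapse to one bit each.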
 69. A method according to claim 43, wherein the transient component is extracted by bandpass filtering or highpass filtering.
 70. A method according to claim 69, wherein a lower cutoff frequency of the bandpass or highpass filtering is at least 2 kHz.
 71. A method according to claim 69, wherein an upper cutoff frequency is in a range between 4.5 and 7 kHz.
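The bandpass extraction of claims 69 to 71 can be sketched with a windowed-sinc FIR filter in plain NumPy. The filter design (Hamming window, 101 taps) and the cutoffs of 2 kHz and 5 kHz are assumptions chosen to fall inside the claimed ranges; the source does not name a filter type. The sampling rate must exceed twice `high_hz`.

```python
import numpy as np


def bandpass_fir(x, fs, low_hz=2000.0, high_hz=5000.0, numtaps=101):
    """Extract the transient band of x with a windowed-sinc FIR bandpass.

    The kernel is the difference of two ideal lowpass responses (cutoffs
    high_hz and low_hz), tapered by a Hamming window. numtaps should be odd
    so that mode="same" keeps the output time-aligned with the input.
    """
    n = np.arange(numtaps) - (numtaps - 1) / 2

    def ideal_lowpass(fc):
        # Ideal lowpass impulse response; np.sinc(x) is sin(pi x) / (pi x).
        return 2 * fc / fs * np.sinc(2 * fc / fs * n)

    h = (ideal_lowpass(high_hz) - ideal_lowpass(low_hz)) * np.hamming(numtaps)
    return np.convolve(x, h, mode="same")


fs = 16000
t = np.arange(1600) / fs
y_low = bandpass_fir(np.sin(2 * np.pi * 500 * t), fs)    # strongly attenuated
y_mid = bandpass_fir(np.sin(2 * np.pi * 3500 * t), fs)   # passed almost unchanged
```

A 500 Hz tone, typical of voiced low-frequency energy, is almost removed, while a 3.5 kHz tone inside the band passes with roughly unit gain.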
 72. A method according to claim 43, wherein the envelope is detected by rectification and lowpass filtering.
 73. A method according to claim 72, wherein the envelope is detected by one-way rectification.
 74. A method according to claim 72, wherein a cutoff frequency of the lowpass filtering is in a range of 400-1000 Hz.
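Envelope detection by one-way rectification and lowpass filtering (claims 72 to 74) might look like the following NumPy sketch. The 700 Hz cutoff is an assumed value inside the claimed 400-1000 Hz range, and the Hamming-windowed sinc lowpass is an illustrative filter choice, not one named in the source.

```python
import numpy as np


def envelope(x, fs, cutoff_hz=700.0, numtaps=101):
    """Envelope by one-way (half-wave) rectification followed by lowpass filtering.

    The lowpass is a Hamming-windowed sinc FIR normalized to unity DC gain.
    """
    rectified = np.maximum(np.asarray(x, dtype=float), 0.0)  # keep positive half only
    n = np.arange(numtaps) - (numtaps - 1) / 2
    h = 2 * cutoff_hz / fs * np.sinc(2 * cutoff_hz / fs * n) * np.hamming(numtaps)
    h /= h.sum()  # unity gain at DC
    return np.convolve(rectified, h, mode="same")


fs = 16000
n = np.arange(1600)
x = np.zeros(1600)
x[600:1000] = np.sin(2 * np.pi * 3000 * n[600:1000] / fs)  # 25 ms tone burst
env = envelope(x, fs)
```

Inside the burst the envelope settles near the mean of a half-wave rectified sine (about 1/pi of the peak), and it stays at zero outside the burst; the 3 kHz carrier ripple is removed by the lowpass.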
 75. A method according to claim 43, wherein the envelope is detected by bandpass filtering, using a bank of filters.
 76. A method according to claim 43, used in speech recognition.
 77. A method of identifying or representing a phoneme "e" as in "heat", comprising the steps of: deriving at most one transient signal corresponding to abrupt energy changes of the phoneme; and identifying or generating a transient pulse in the transient signal with a rise time of a leading edge of at most 0.5 ms and a duration of at most 1.1 ms.
 78. A method according to claim 77, wherein the rise time of the leading edge is at most 0.4 ms.
 79. A method according to claim 77, wherein the duration is at most 1.0 ms.
 80. A method according to claim 77, wherein a fall time of a lagging edge is at most 0.3 ms.
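To test a detected pulse against claims 77 and 80, its rise time, fall time and duration must first be measured. The claims do not define these quantities precisely, so this NumPy sketch assumes common conventions: rise and fall times between the 10% and 90% crossings, duration as the span at or above 50% of the peak (consistent with claim 61), all at sample resolution. The function names are hypothetical.

```python
import numpy as np


def pulse_features_ms(env, fs, lo=0.1, hi=0.9, level=0.5):
    """Rise time, fall time and duration (ms) of a single pulse envelope.

    Assumed definitions: rise/fall time is the time between the lo*peak and
    hi*peak crossings on each edge; duration is the span of samples at or
    above level*peak. Resolution is one sample.
    """
    env = np.asarray(env, dtype=float)
    peak = int(np.argmax(env))
    a = env[peak]
    rising, falling = env[: peak + 1], env[peak:]
    if not np.any(falling <= lo * a):
        raise ValueError("pulse does not decay below lo * peak")
    rise = np.argmax(rising >= hi * a) - np.argmax(rising >= lo * a)
    fall = np.argmax(falling <= lo * a) - np.argmax(falling <= hi * a)
    above = np.flatnonzero(env >= level * a)
    duration = above[-1] - above[0]
    to_ms = 1000.0 / fs
    return float(rise * to_ms), float(fall * to_ms), float(duration * to_ms)


def matches_e(rise_ms, duration_ms, fall_ms=None):
    """Claim 77 test (rise <= 0.5 ms, duration <= 1.1 ms); optionally claim 80 (fall <= 0.3 ms)."""
    ok = rise_ms <= 0.5 and duration_ms <= 1.1
    if fall_ms is not None:
        ok = ok and fall_ms <= 0.3
    return ok


# Symmetric triangular pulse: 16-sample rise, 16-sample fall, then silence.
env = np.concatenate([np.linspace(0, 1, 17), np.linspace(1, 0, 17)[1:], np.zeros(10)])
rise, fall, dur = pulse_features_ms(env, fs=48000)
print(matches_e(rise, dur, fall))  # True
```

At 48 kHz the test pulse has a rise time of about 0.27 ms and a duration of about 0.33 ms, so it meets the "e" criteria; the same sample pattern at 8 kHz stretches to a 1.6 ms rise and would be rejected.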
 81. A method according to claim 77, wherein said method is used in speech recognition.
 82. A method according to claim 77, used in speech compression.
 83. A method according to claim 77, used for synthesizing human speech, further comprising the step of: replicating the transient pulse to provide a series of transient pulses corresponding to a series of phonemes, wherein the series of transient pulses is generated corresponding to the series of phonemes which constitute the human speech to be synthesized.
 84. A method according to claim 83, wherein the series of phonemes is established from a series of letters using rule-based conversion.
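Claim 83 generates speech by replicating a stored transient pulse for each phoneme in a sequence. The sketch below shows only that replication step, under heavy assumptions: the `TEMPLATES` table, the triangular pulse shape and the fixed 5 ms spacing are hypothetical, and the rule-based letter-to-phoneme conversion of claim 84 is not modeled. The template values are illustrative, picked so that the rise times stay within 0.5 ms and the half-amplitude durations come out at roughly 0.3 ms for "e" and 1.5 ms for "o", inside the ranges of claims 77 and 87.

```python
import numpy as np

# Hypothetical pulse templates: (rise_ms, fall_ms) per phoneme; illustrative values only.
TEMPLATES = {"e": (0.3, 0.3), "o": (0.4, 2.6)}


def triangular_pulse(rise_ms, fall_ms, fs):
    """Linear rise from 0 to 1, then linear fall back to 0."""
    n_rise = max(1, round(rise_ms * fs / 1000))
    n_fall = max(1, round(fall_ms * fs / 1000))
    return np.concatenate([np.linspace(0.0, 1.0, n_rise, endpoint=False),
                           np.linspace(1.0, 0.0, n_fall + 1)])


def synthesize(phonemes, fs=16000, period_ms=5.0):
    """Place one replicated template pulse per phoneme, one every period_ms."""
    period = round(period_ms * fs / 1000)
    out = np.zeros(period * len(phonemes))
    for i, ph in enumerate(phonemes):
        p = triangular_pulse(*TEMPLATES[ph], fs)
        if len(p) > period:
            raise ValueError(f"pulse for {ph!r} is longer than period_ms")
        out[i * period: i * period + len(p)] = p
    return out


signal = synthesize(["e", "o", "e"])
print(len(signal), signal.max())  # 240 1.0
```

In a fuller system the pulse train would feed a decoder or excitation stage rather than being played back directly; this sketch stops at the pulse series itself.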
 85. A method according to claim 77, used in quality measurement of audio products, the audio products being loudspeakers, hearing aids or telecommunication systems.
 86. A method according to claim 77, used in quality measurement of acoustic conditions in a room or in an open environment.
 87. A method of identifying or representing a phoneme "o" as in an English word "ongaonga" or a Danish word "Ole", comprising the steps of: deriving at most one transient signal corresponding to abrupt energy changes of the phoneme; and identifying or generating a transient pulse in the transient signal with a rise time of a leading edge of at most 0.5 ms and a duration of 1.3-1.8 ms.
 88. A method according to claim 87, wherein the rise time of the leading edge is at most 0.4 ms.
 89. A method of identifying or representing a phoneme "u" as in an English word "who", comprising the steps of: deriving at most one transient signal corresponding to abrupt energy changes of the phoneme; and identifying or generating a transient pulse in the transient signal with a sine curve interpolation and a duration of 1.0-2.0 ms.
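The numeric bounds in claims 77, 87 and 89 can be collected into a small lookup that returns every vowel compatible with a measured pulse. This is a sketch under stated simplifications: the ranges overlap, so the function returns candidates rather than a single decision, and the "sine curve interpolation" shape criterion for "u" is not checked because the claim gives no numeric form for it. The `RULES` table and `candidate_vowels` name are hypothetical.

```python
# Duration ranges (ms) from claims 77, 87 and 89; rise-time bound (ms) from claims 77 and 87.
RULES = {
    "e": {"max_rise": 0.5, "duration": (0.0, 1.1)},
    "o": {"max_rise": 0.5, "duration": (1.3, 1.8)},
    "u": {"max_rise": None, "duration": (1.0, 2.0)},  # shape criterion not modeled
}


def candidate_vowels(rise_ms, duration_ms):
    """Return the vowels whose claimed rise-time and duration bounds the pulse satisfies."""
    out = []
    for vowel, rule in RULES.items():
        lo, hi = rule["duration"]
        if not lo <= duration_ms <= hi:
            continue
        if rule["max_rise"] is not None and rise_ms > rule["max_rise"]:
            continue
        out.append(vowel)
    return out


print(candidate_vowels(0.3, 1.5))  # ['o', 'u']
```

Where several candidates remain, a further cue such as the lagging-edge shape (claims 55 and 56) would be needed to separate them.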